REMARKS 

, . , . v - n to add the J h m 

engines, and to further clarity that each of the processing engines o connected to execute 

i v ,< -egnxi - i .w i resn^nxe to \ ^ it.-! iioi the e> v 'iu' > i multiple 

i u< t Kk " - ..it- \pp k « i tiu nv'ux ulv i v < 'i i an and U to rextU. 

it, ii seret nended claim I Ukr these amend s nx 

5 " s ^ , < ' i. s 1:3 26 and s i ire independent. 

seexa c ectc 5 1 8 20 21 and 24 under 35 1 M eP^ - v< , 

unpatentable over U.S. Patent No. 6,295,600 to Parady in view of U.S. Patent No. 6,507,862 to

Joy. The examiner further rejected claims 22 and 23 under 35 U.S.C. § 103(a) as being 

unpatentable over Parady in view of Joy. Additionally, the examiner rejected claim 25 under 35 

U.S.C. § 103(a) as being unpatentable over Parady in view of Joy, and further in view of U.S.

Patent No. 6,085,215 to Ramakrishnan. The Examiner also rejected claims 26-35 under 35

I ^t » ^M' ^ ,ilu\t Irnth vi'u< sr ef !\aae\ <M ^ 

<peei v IK sM?h-e,spec < sup leant s independent claim 1 3 he-.xaenn Maiu 

2 Pareay taught the invention ^ t < bs claimed including a data 
,^ - ,< s Xt s* <- v - i t , r . T ox .~>roi x; of ; re^- 
XX . + 'n 1ie< ; i o.^ t 
v "u )tt-'p^e r os>e? t 
add. ess registers itseo go; 3 - 58X35;, the at least one instruction 

so roe 

shared by the multiple processing engines (e.g., see figs. X2 and col. 3,
line 3-col. 4, line 62).

<ecut - thread ha * 

t ,\v,s - u ^ fv tin < * <n< ^ 1 ~ 

to issue the request to the shared resource (e.g., see col. 4, line 3-col. 5,
line 5); and

„,, . „ is e t i h n > 

generated in response to the request to the shared resource (e.g., see
col. 4, lines 52S2).

5. Parady did not expressly detail individual ones of execution engines
eluding corres.pt - > for executlo Joy 

er f dividual ones c ultiihre ded executioi en toes 
K>4,9 - Q b e e ee cc tes 26-40 and go 

lines 8-65) [each multithreaded pipeline has its own thread selectable flip-
flop substitution logic used to create vertically multithreaded
functionality].
6. One of ordinary skill in the DP art would have been motivated to
combine the teachings of Parady and Joy. Parady taught a processor that
processed multiple threads and switched on a long latency event such as
a cache miss to allow processing to continue on long latency events (e.g.,
see col. 2, lines 25-42). Joy taught the improvement of efficiency and

>ara!ie >y it adi it d steps . vertic* 

multithreading and horizontal multithreading (e.g., see col. 3, lines 3-13).
Therefore one of ordinary skill would have been motivated to incorporate
e Joy vertica a! mul «. e v 5 

- 0 threads to increase e^'ic 

threads would be processed in a timely manner when a long latency
event occurred such as a cache miss.

As to the further limitation of claim 13, Parady taught executing
one or more additional instructions of the first thread, after executing the
at least one instruction [load or store] to issue the request to the shared

( l A 1 » O "Ml 'llP-i'f 

non-blocking loads, which allow further instructions in the same thread
after the load to be executed, with the system remaining and executing in

. „ < U , >! *■ < t 

-f* further 

e , , v. ^ ore *c ccmoiete and 

at that later point or later thread swapping is allowed]. (Final Action,
pages 2-4)



\ tml cat I v atiy the 

,A)tc: o 1 expressed 0 paragraphs 2-4 of the Final Aehon. 

\ppim.tr.t - a dAamdem claim 1 ^ recites "{a} t U d l < f s at a ! !t mu, t 
v f - thjf tie 

each configured to execute separate groups of threads, individual ones of the engines including

<. ! K v rf v s u ( s i hK ^ t \ia.'!0 Pi St 

least one instruction of a first thread having a first program counter, the at least one instruction
ssits quest % t 1 

i U 1 V < 1 . * < V I U'tl 

espouse hereques the e e ired le multiple nui'J -threaded 
puv \ won w " i\.-x xylo: :t;d, 1 p r 1 o t s j en wo.. - , tch | At e> separate groups 
W > v '-AO, A a \ ' l.t t ^ it s< 0 >„ie _ -t >: A A ^ACs^n 

issues a request to a resource shared by the multiple processing engines and is swapped out, the 

\ (. „t. it ' * -a i r j 0 <. s 

made; n ended p"ve-\tng engines. 
instructions to a decode unit 14. The instruction cache can receive its
instructions from a prefetch unit 16, which either receives instructions from
branch unit 18 or provides a virtual address to an instruction TLB
(translation look-aside buffer) 20, which then causes the instructions to be
fetched from an off-chip cache through a cache control/system interface 22.
The instructions from the off-chip cache are provided to a pre-decode unit
24 to provide certain information, such as whether it is a branch
instruction, to instruction cache 12.

Instructions from decode unit 14 are provided to an instruction buffer 26,
where they are accessed by dispatch unit 28. Dispatch unit 28 will provide
four decoded instructions at a time along a bus 30, each instruction being
provided to one of eight functional units 32-46. The dispatch unit will
dispatch four such instructions each cycle, subject to checking for data
dependencies and availability of the proper functional unit.

The first three functional units, the load/store unit 32 and the two integer
ALU units 34 and 36, share a set of integer registers 48. Floating-point
registers 50 are shared by floating point units 38, 40 and 42 and graphical
units 44 and 46. Each of the integer and floating point functional unit
groups have a corresponding completion unit, 52 and 54, respectively. The
microprocessor also includes an on-chip data cache 56 and a data TLB 58.

FIG. 2 is a block diagram of a chipset including processor 10 of FIG. 1. Also
shown are L2 cache tags memory 80, and L2 cache data memory 82. In
addition, a data buffer 84 for connecting to the system data bus 86 is shown.
In the example shown, a 16-bit address bus 88 connects between processor
10 and tag memory 80, with the tag data being provided on a 28-bit tag data
bus 89. An 18-bit address bus 90 connects to the data cache 82, with a 144
bit data bus 92 to read/write cache data.

FIG. 3 illustrates portions of the processor of FIG. 1 modified to support
the present invention. As shown, a decode unit 14 is the same as in FIG. 1.

» t in > 1 j S4 8 a 

i .»« [ ' > t< Hih , \ 'h i . > < u <o 

from a particular thread are provided to dispatch unit 28, which then
provides them to instruction units 41, which include the multiple pipelines
32-46 shown in FIG. 1. (emphasis added, col. 3, lines 7-52)
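
By way of illustration only, the single-dispatch arrangement described in the passage quoted above can be sketched as follows. This is a simplified model, not a characterization of Parady's actual hardware: the names `Instr`, `UNIT_POOL` and `dispatch_cycle`, the in-order stall policy, and the register-name dependency check are all assumptions introduced for the sketch. Only the issue width of four and the mix of eight functional units (one load/store, two integer ALU, three floating point, two graphical) are taken from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction; field names are illustrative only."""
    name: str
    unit_kind: str          # "load_store", "int_alu", "fp" or "graphics"
    dest: str               # destination register
    srcs: tuple = ()        # source registers


# Unit counts mirror the eight functional units 32-46 of the quoted passage.
UNIT_POOL = {"load_store": 1, "int_alu": 2, "fp": 3, "graphics": 2}


def dispatch_cycle(buffer, width=4, pool=UNIT_POOL):
    """Return the instructions issued in one cycle by a single dispatch unit.

    Assumed policy: issue in order and stop at the first instruction that
    has a data dependency on an instruction issued in the same cycle, or
    for which no functional unit of the required kind is free.
    """
    free = dict(pool)       # functional units still available this cycle
    written = set()         # registers written by instructions issued so far
    issued = []
    for instr in buffer:
        if len(issued) == width:
            break
        depends = instr.dest in written or any(s in written for s in instr.srcs)
        if depends or free.get(instr.unit_kind, 0) == 0:
            break
        free[instr.unit_kind] -= 1
        written.add(instr.dest)
        issued.append(instr)
    return issued
```

In this sketch every instruction, whichever thread it comes from, passes through the same dispatch step and draws on the same pool of functional units, which is the point of the discussion that follows.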

j is n !/ - jpp.rrif-s k j ! - .i ■ ! ^\ t ; v --m c;:ei; ; e a \.<"gh taoap of 

threads. As such, the available resources are used only by that one processing engine, and are

separate groups of threads. Accordingly, Parady does not disclose or suggest at least the feature
- c t l\ pr cess ig c ginc 5 it tin a p oot so N ^ 1 „ lulttple n ! - Kt v k'd 
t!K" of die one; ses Uik'^dsO^ v i\ -pond a .nhuc^ v -e-eet i -s- ifv .orrespondirig set of

1 v lU v\ CV. (. i <. 'v i K! U v 

\1 ^ v k su-ee Oarad\ spi s s t base resource.- thai arc -ha;ed h\ ntpv 

su< i sbarcv i ee < 1 H •> * s i e.\^ 5 c e detect! n 

of a - ! e ) 1 l i v. i » t l\ ^ 4 i s 1 d <. s s - t v 5 

resources exist. Accordingly, Parady fails to disclose or suggest at least the feature "executing at
lea ou ^ii. vi i ithstt < fust ffoyaiu coujtei, iV a east muUi 
u ! \ at v ;.i ^.ivauc'-oo i s^bv a request to a resource i fo v -oi.lople nsuks- 
s _ » v n n] i Mo the ' i i » > k uuot of 

V v y , I I IM i L 1 V i 

process; nget nes $ v< ka n tpj 

As for the examiner's contention that Parady's elements 32, 34, 36, 38, 40, 42, 44 and
46 correspond to multiple processing engines, as described in Parady, the units 32-46 are
+ i>- .A, us \ u ^ iA.s^ul<t dnd kk lit v n f fuin^r 

ALU units 34 and 36, share a set of integer registers 48. Floating-point registers 50 are shared by 

e s < , S 1 > ut t\ 14 and 4o (I 0 v\ - jus N v * 

Thus, the functional units 32, 34, 36, 38, 40, 42, 44 and 46 are modules within a processor
configured to perform specific functions in the course of processing instructions received by the

cess r (eg 5 tc e> -> I presumably perform A >peratio egca 

operands spvvtiVd m tru seccn cu m>=!riK>>as). These funetion.il nous per lorn t -.-peiauons only 
srclati < hi d x . igos be single processo in fims !a. c 32 M 

a)!,f -v oo.kUc v' t*e gioap^ i aieids 

I t,x i do clatioj smgje 

i.'u^ta'U e 4 i ve-so 5 > <s<\i ,Ok : lo leWi. * i 

n c s | y --poet ! an) . i ip 0 ps sace 1 > i ! v 

description at col. 10, line 41 to col. 13, line 7), and a thread switch logic 610 that at least in part
t. ! »e v a t > . ^ pi n s I tt 
thread switching logic such as the pulse-based high-speed flip-flop 400 to
improve speed of thread switching. The pulse-based high-speed flip-flop 400
enables virtually instantaneous switching between threads, saving of the
sac bines u stalled thi td tnd i n s !«s«' tti 

I < nds range, context switch! 

. <>«?■«> s ( ii i >< s t < >i flip flop 

400. The thread switch logic 610 receives a plurality of input signals that
evoke a context switch and thread switch. In an illustrative processor, input
terminals to the thread switch logic 610 include an L1_load_miss terminal,
an L1_instruction_miss terminal, an instruction_buffer_empty terminal, a
thread_priority terminal, an MT_mode terminal, an external_interrupt
terminal, and an internal_interrupt terminal. The thread switch logic 610
generates a thread identifier (TID) signal based on signals to the input
terminals. The thread switch logic 610 generates the TID signal with a
thread switch delay or overhead of one processor cycle. (Joy, col. 15, line
52, to col. 16, line 5)
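
For illustration only, the thread switch logic described in the passage quoted above can be modeled as a function from a set of input signals to a thread identifier. The sketch below is an assumption-laden simplification rather than Joy's circuit: the class `SwitchInputs`, the function `next_tid`, the four-thread default, and the "advance to the next thread on any stall or interrupt" policy are hypothetical, and the thread priority input is omitted. The field names paraphrase the input terminals listed in the quotation.

```python
from dataclasses import dataclass


@dataclass
class SwitchInputs:
    """Signals presented to the (hypothetical) thread switch model."""
    l1_load_miss: bool = False
    l1_instruction_miss: bool = False
    instruction_buffer_empty: bool = False
    external_interrupt: bool = False
    internal_interrupt: bool = False
    mt_mode: bool = True    # multithreading enabled


def next_tid(current_tid, inputs, num_threads=4):
    """Return the thread identifier (TID) for the next cycle.

    Assumed policy: when multithreading is enabled and any switch-evoking
    signal is asserted, move to the next thread in circular order;
    otherwise keep running the current thread.
    """
    if not inputs.mt_mode:
        return current_tid
    switch = (
        inputs.l1_load_miss
        or inputs.l1_instruction_miss
        or inputs.instruction_buffer_empty
        or inputs.external_interrupt
        or inputs.internal_interrupt
    )
    return (current_tid + 1) % num_threads if switch else current_tid
```

Every argument of `next_tid` in this sketch describes the state of the one processor that contains the switch logic; none of them represents a signal returned by a resource shared among several processors.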



In other implementations, the thread switch logic 610 implements one or
more of several thread-switching methods. ...

\ third thread it< g ration is sin i dig si ghb? sent ilet 
thread-switching in which a thread switch decision is selectively
programmed, based on one or more signals. In one example an intelligent
global scheduler uses signals such as: (1) an L1 data cache miss stall signal,

(2 i 1 i I i _ •» (4) an 

SI v !> i iv!< j S i ) O (i 

priority signal, (7) a thread timer signal, (8) an interrupt signal, or other
sources of triggering. In some embodiments, the thread select signal is
broadcast as fast as possible, similar to a clock tree distribution. In some
systems, a processor derives a thread select signal that is applied to the flip-
flops by overloading a scan enable (SE) signal of a scannable flip-flop. (Joy,
FIG. 6, and col. 16, line 06, to col. 17, line 32)

e sign c v 1 read switching in J s processes ire II 
local to the multi-threaded processor in which the switch logic 610 resides. For example, the
ttv.^'d.i v, o <>uv e m w ne nstrdetstn. chc I 1 < lvr \ ,o O ' 

40-44.) 

j Otw Ml. est I', 1 . I ! vt-sOi 

(see for cxamp! ( c escriptioo at col, 1 %, lines 39 to c< 1 k ^ ! * 

> point ck N v h\ ffct ed K s ale 

response to requests nj resources s ared by a! pressors R,dhei thre ds\s,spp poxlhtdua 
vertically 1 eaded process s qiends \ N on he signals generated oeaih * tcrcspeeii e 
of "executing at least one instruction of a first thread having a first program counter, the at least

O V . V 5 V. t T S O i > it S \ I Si V S V ^» s 

multiple multi-threaded processing engines; . . .; and swapping execution to the first thread after
't s v. h i ' i ' <_ - i <. )( 

hii „ v n p v > 

kuiu v ■> d> s t 1 s <i sel sest) uggcMs lor-e-.'i i 

the feature ,\s. >e as v« n v o stuo'jon of a first tin tad It e. see a h.-a pteetae. counter. 

Li SOI lis 1 t. msOlVlO ! tO IssUv 1 t|i sOUiU 

shared by the multiple multi-threaded processing engines; . . .; and swapping execution to the first

v i s v v 1 k s»t t l^o. . s , w 1 v tx 

e, ' t , s u j| \ xmden s f ink 

over the cited art. 

riwin- * is o d ' > v,pviu { i i ndqieil i Jauo * see ik «. i v. 

for at icu-4 K -> s i 3. 

As noted, the examiner rejected claims 26-35 under 35 U.S.C. § 103(a) as being
i ! „ N k \ i s~ - a . < P 1 tjs md e\ 

v. ^ V i'' 1 ' ivO . i 'U 1 »l ^' sis {f , , „ , * 

thread having a first program counter, the at least one instruction including at least one
i s _t j s< i resource sharec n the muiu !e n s 1 

rgmes i ' ^ o t s i v ! ^ 

response ft the n-uncst the icm.uicc shared p. hi i ittn nuf eMirva.k-d pexcssanp 

<. M IT tO * V | < i f I v. s.1 s! 

s < i s v 3 i _ v u > si i , > 

Ramakrishnan describes a method and apparatus for avoiding receive livelock and transmit
t os i s ! i<li e> a w -on >< i * 1 ' * < 

u s , v <ar r-e i < Mm 'b iph ^ u-s rte*' ud 

t . s „ s ! '! t, a.ss p K\ '1 u\ s s s i 

which i ' > s e\cs a bonk t'm be re being subject to precnip b 
i i v M i IS M! ih l a i s v

i v. ' h J ^ i n ! s v 

dui < a special task to svhiel nfroS 1 
t s tcessing thread. 11 *ed *< r 

round-robin selection device with knowledge of each of the processing threads
and access to the real time thread flags 50. When one processing thread is
completed, the scheduler 46 chooses the next one in turn having a flag set to
indicate that processing is needed. FIG. 5 shows the basic steps performed
by the core thread scheduler 46. After return from a completed processing
thread, indicated at 56, the scheduler 46 selects the next thread that needs
to run (block 58) and yields control to the next thread (block 68). The
scheduler 46 includes at least one time slot in its round robin system for the
general purpose domain 42. (col. 9, lines <>-22)
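
For illustration only, the round-robin selection described in the passage quoted above can be sketched as below. The function name `select_next`, the list-of-booleans representation of the thread flags, and the return value of `None` when no flag is set are assumptions made for the sketch and are not taken from Ramakrishnan.

```python
def select_next(flags, last):
    """Pick the next thread, in circular order after `last`, whose flag is set.

    `flags[i]` is True when thread i has been marked as needing processing.
    Returns the index of the selected thread, or None if no flag is set.
    """
    n = len(flags)
    for offset in range(1, n + 1):
        candidate = (last + offset) % n
        if flags[candidate]:
            return candidate
    return None
```

The selection in this sketch depends only on which flags are set and on the position of the thread that last completed. It runs once a thread has finished its work, not in response to a signal generated by a request to a shared resource.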

' s x , i s i <. s n k < ■ ) tpA 

* i ] t d ^ i s d-iohn ! d -.dec una P k 

! ^ _ ! H V s I O O 

.septate -A'ours :t.nt*at!s ani swhero Jots R«imaknshnjn dtwdH.- I at i^s i.j oad: * 

n\ f ] c u >~ . Jvnuj x 2 <an ih^hi u p us ikiM) i i o-v i so does not 

dc - u oid hi a s > i S < i 

U O v. t U I led fls I Pk 0 i 

tvs A. corj-mjh. Rjn-aknsrauP i ck to di>dos^ or mu>s.w a k\i<; tke V,t,t"c d\ui>L 
- hvoi no ,MJ,a . - a * Nf s wt A ii.t J first p-u<r e\-s anuria. k< 0 \v 1 one ir r k no 
! ssuc a request to a rcsouicc s ul f\\ fa i 

readed oces engines ! swap exec on to the -> ( ^ c e \ i sigtu 

v ^. }, l > 4 v OU . lOH. no . > v kki. 

Bc\ a\ ' s > ' j'. a; v , s , I >>v - v io tn 

mbina ! ea tote ito exec e at 1 "ot >k Ui f kn-a thread has in \ 
>>n sf\k inch/moc at Wast one in to tssuo a ivqae^ 

, v. ^ ! V « ! ! e I p S ) K. <. N 

to t' - ■> ksea : k JuoaOs i o;k Moral .ui. ' s tp is. s V K c ^ - >k iee^urce 

^ e i \ ^ I ! t' 1 advt o ee-Jsi % t ^'incs ip| o depesok 

a >1 . ^ v "\ov. . a'oMi', 



> dm,- 1 'vr: irs-lcp i t ' s J i ! v. u ■) ^ 

:he vnnereasOi - ependvm chum > i In ^ ' A d>vv\\M\ veoetve sv - an 

h \k\ c I hkI Ao hjcv ><i- s theexcm . t been 

addressed. 

In view of £ >p > 1 pect fully submi nU the applk 

to\ e r d 1 t ; > v Pa ii<. U . \ . ^ 

convenience. 

does > can tb < N p K n i ul(^ met comments o Ik ! n - 

tit i • i ^ t , M,t a v amis > oes s o ru t m K $r t » i ^ ^im.i 
patentability of those claims and other claims, or (c) amended or canceled a claim does not mean
hat the a] edes any of the m dons c d s <*< t ; 

claims, 

n L <v p s k uif ^ 

docket numbe: $Uo*:kjiwc 